Our eventual aim is to run fine-mapping informed by functional annotations, and the tool suited to that aim is PAINTOR. However, we first need to ensure that a basic analysis (no functional annotation) is reliable and that its results are consistent. Several tools are available for this, including PAINTOR, FINEMAP and CAVIAR. All three belong to the class of fine-mapping methods that take GWAS summary statistics as input but require an estimate of the LD pattern from a reference panel.
We selected FINEMAP for the basic analysis (no functional annotation) because its statistical model is similar to PAINTOR's, but FINEMAP allows more control and its output is more verbose (the number of causal variants and the priors are explicitly specified; see also this discussion). Moreover, the FINEMAP authors recently published a review paper on running fine-mapping analysis on large-scale datasets (Benner et al. 2017). The 3rd and 4th paragraphs of its Discussion section, quoted below, are particularly relevant to the results reported here.
We also showed that the size of the reference panel must scale with the GWAS sample size. Although a panel of 1,000 samples is adequate for a GWAS sample size of 10,000, a panel of 10,000 samples is needed for a GWAS sample size of 50,000. This result has important consequences for ongoing large meta-analysis efforts and biobank studies. We confirmed the result in three ways: empirically through simulations, analytically through likelihood evaluations, and theoretically through mathematical derivation.
In our analyses, we used FINEMAP software, which is based on a stochastic search algorithm. We verified that the results of FINEMAP were consistent across separate runs when the LD information provided a good approximation of the LD information from the original genotype data. We also observed that inaccurate LD information or mismatches in the allele coding between the reference panel and GWAS data could lead to an inflation of false positives and also to an inconsistency between the FINEMAP results across separate runs. Such problems typically manifest when the posterior probability of the number of causal variants concentrates on the maximum value possible and can therefore be detected by comparison of several FINEMAP runs that allow for increasing numbers of causal variants.
Analysis setup:

- EUR of ~500 individuals

There are also alternative fine-mapping methods that don’t require LD information:
| Locus | MarkerName | Chr | Pos | cytoband | gene.context | Major | Minor | MAF | Effect | StdErr |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | rs10399947 | 1 | 150,861,960 | 1q21.3 | ARNT–[]–SETDB1 | G | A | 0.369 | -0.06 | 0.01 |
| 2 | rs10200279 | 2 | 202,170,655 | 2q33.1 | [ALS2CR12] | C | T | 0.287 | 0.07 | 0.01 |
| 3 | rs192481803 | 2 | 35,336,564 | 2p22.3 | [] | C | T | 0.007 | 0.65 | 0.12 |
| 4 | rs62246017 | 3 | 71,483,084 | 3p13 | FOXP1—[]—EIF4E3 | G | A | 0.325 | 0.07 | 0.01 |
| 5 | rs6791479 | 3 | 189,205,032 | 3q28 | TPRG1—[]—TP63 | A | T | 0.427 | 0.09 | 0.01 |
| 6 | rs35407 | 5 | 33,946,571 | 5p13.2 | [SLC45A2] | G | A | 0.042 | -0.47 | 0.04 |
| 7 | rs4455710 | 6 | 32,608,858 | 6p21.32 | [HLA-DQA1] | C | T | 0.368 | 0.14 | 0.02 |
| 8 | rs12203592 | 6 | 396,321 | 6p25.3 | [IRF4] | C | T | 0.166 | 0.44 | 0.01 |
| 9 | rs10944479 | 6 | 90,880,393 | 6q15 | [BACH2] | G | A | 0.189 | -0.09 | 0.02 |
| 10 | rs117132860 | 7 | 17,134,708 | 7p21.1 | AGR3—[]—AHR | G | A | 0.021 | 0.25 | 0.04 |
| 11 | rs7834300 | 8 | 116,611,632 | 8q23.3 | [TRPS1] | C | G | 0.438 | 0.07 | 0.01 |
| 12 | rs1325118 | 9 | 12,619,616 | 9p23 | []–TYRP1 | T | C | 0.304 | -0.07 | 0.01 |
| 13 | rs10810657 | 9 | 16,884,586 | 9p22.2 | BNC2–[]—CNTLN | A | T | 0.404 | -0.10 | 0.01 |
| 14 | rs57994353 | 9 | 139,356,987 | 9q34.3 | [SEC16A] | T | C | 0.284 | 0.09 | 0.01 |
| 15 | rs1126809 | 11 | 89,017,961 | 11q14.3 | [TYR] | G | A | 0.279 | 0.15 | 0.01 |
| 16 | rs74899442 | 11 | 115,890,279 | 11q23.3 | CADM1—[]—BUD13 | T | C | 0.004 | 0.60 | 0.11 |
| 17 | rs7939541 | 11 | 9,590,389 | 11p15.4 | ZNF143–[]-WEE1 | T | C | 0.410 | 0.08 | 0.01 |
| 18 | rs657187 | 12 | 52,898,985 | 12q13.13 | KRT6A–[]-KRT5 | A | G | 0.420 | -0.07 | 0.01 |
| 19 | rs721199 | 12 | 96,374,057 | 12q23.1 | [HAL] | C | T | 0.463 | -0.06 | 0.01 |
| 20 | rs1800407 | 15 | 28,230,318 | 15q13.1 | [OCA2] | C | T | 0.070 | 0.16 | 0.02 |
| 21 | rs1805007 | 16 | 89,986,117 | 16q24.3 | TCF25-[]-TUBB3 | C | T | 0.078 | 0.38 | 0.02 |
| 22 | rs6059655 | 20 | 32,665,748 | 20q11.22 | [RALY] | G | A | 0.077 | 0.25 | 0.02 |
All column names: MarkerName, Chr, Pos, cytoband, gene.context, Major, Minor, MAF, Effect, StdErr, P.value, HetISq, HetChiSq, HetDf, HetPVal.
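Summary-statistics fine-mapping tools generally work from per-variant z-scores, which can be derived from the `Effect` and `StdErr` columns listed above. The following is a minimal sketch, assuming the Wald statistic z = Effect / StdErr. The function name `z_score` and the dictionary layout are illustrative only, and the input values are the rounded ones shown in the table, so the resulting z-scores are approximate.

```python
import math

def z_score(effect, stderr):
    """Return the Wald z-score Effect / StdErr, or None if it is undefined."""
    if effect is None or stderr is None:
        return None
    if not math.isfinite(effect) or not math.isfinite(stderr) or stderr <= 0:
        return None
    return effect / stderr

# Rounded values copied from three rows of the table above.
rows = [
    {"MarkerName": "rs10399947", "Effect": -0.06, "StdErr": 0.01},
    {"MarkerName": "rs192481803", "Effect": 0.65, "StdErr": 0.12},
    {"MarkerName": "rs12203592", "Effect": 0.44, "StdErr": 0.01},
]
for row in rows:
    row["Z"] = z_score(row["Effect"], row["StdErr"])

# Rank by absolute z-score, largest first (analogous to a rank_z column).
ranked = sorted(rows, key=lambda r: abs(r["Z"]), reverse=True)
for rank, row in enumerate(ranked, start=1):
    print(rank, row["MarkerName"], round(row["Z"], 2))
```

Variants with a missing or non-positive standard error return `None` here, so they can be counted separately instead of silently entering the analysis.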
| Group | Number of variants |
|---|---|
| Total | 24,707,509 |
| MAF > 1% | 10,792,565 |
| MAF <= 1% | 13,914,944 |
The MAF values in the original summary statistics file ranged from 0 to 1. How was MAF computed (using all cohorts or a subset) and then used in the GWAS? Is cohort-specific filtering by MAC better? How should variants be filtered by MAF/MAC in fine-mapping?
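A true minor allele frequency cannot exceed 0.5, so a column ranging from 0 to 1 presumably holds the frequency of one specific allele. One possible way to apply the 1% threshold in that case is to fold the frequency first. The sketch below assumes that interpretation. The function names and the toy frequencies are hypothetical and are not taken from the summary statistics file.

```python
def fold_to_maf(freq):
    """Fold an allele frequency in [0, 1] to a minor allele frequency in [0, 0.5]."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError(f"allele frequency out of range: {freq}")
    return min(freq, 1.0 - freq)

def split_by_maf(freqs, threshold=0.01):
    """Count variants with folded MAF above vs. at-or-below the threshold."""
    counts = {"common": 0, "rare": 0}
    for f in freqs:
        key = "common" if fold_to_maf(f) > threshold else "rare"
        counts[key] += 1
    return counts

# Toy frequencies (hypothetical, for illustration only).
toy_freqs = [0.0, 0.004, 0.01, 0.25, 0.631, 0.995, 1.0]
print(split_by_maf(toy_freqs))
```

Without folding, a variant with frequency 0.995 would pass a "MAF > 1%" filter even though its minor allele is rare.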
An indication of failed fine-mapping: the posterior probability of the number of causal variants is highest at the maximum number allowed. In addition, the top SNPs ranked by posterior probability of being causal (rank_pp column) are far from the top SNPs ranked by Z-score (rank_z column).
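Both warning signs can be checked programmatically. The sketch below is one possible way to do it, not part of FINEMAP itself. The function names, the rank cutoff of 100, and the posterior over the number of causal variants are hypothetical. The rank_z and rank_pp values are copied from the first locus 1 output in this report.

```python
def ncausal_at_boundary(ncausal_post):
    """True if the most probable number of causal variants is the largest value allowed."""
    best = max(ncausal_post, key=ncausal_post.get)
    return best == max(ncausal_post)

def rank_discordance(snps, top=3):
    """Largest rank_z among the `top` SNPs ranked by posterior probability."""
    leaders = sorted(snps, key=lambda s: s["rank_pp"])[:top]
    return max(s["rank_z"] for s in leaders)

# Hypothetical posterior over the number of causal variants (maximum allowed: 3).
ncausal_post = {1: 0.01, 2: 0.04, 3: 0.95}

# rank_z / rank_pp copied from the first locus 1 output in this report.
snps = [
    {"snp": "rs12090215", "rank_z": 19, "rank_pp": 1},
    {"snp": "rs78278355", "rank_z": 1446, "rank_pp": 2},
    {"snp": "rs10399947", "rank_z": 1, "rank_pp": 3},
]

# The cutoff of 100 is an arbitrary illustrative choice.
suspicious = ncausal_at_boundary(ncausal_post) or rank_discordance(snps) > 100
print("suspicious run:", suspicious)
```

A flagged run is a prompt to re-check the LD matrix and allele coding, for example by re-running with a larger maximum number of causal variants and comparing results.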
- tables of results: `config`, `snp`, `ncausal`
- locus: 1
-- config:
-- input snps: 2370 fine-mapped + 310 missing Z/LD = 2680 in total
# A tibble: 10 x 4
rank config config_prob config_log10bf
<int> <chr> <dbl> <dbl>
1 1 rs10399947,rs12090215,rs78278355 0.167 23.6
2 2 rs1134067,rs12090215,rs78278355 0.130 23.5
3 3 rs6686064,rs12090215,rs78278355 0.113 23.5
# ... with 7 more rows
-- snp:
# A tibble: 2,680 x 6
snp rank_z rank_pp snp_prob snp_prob_cumsum snp_log10bf
<chr> <int> <int> <dbl> <dbl> <dbl>
1 rs12090215 19 1 1.00 0.333 13.2
2 rs78278355 1446 2 1.00 0.667 13.2
3 rs10399947 1 3 0.167 0.722 2.50
# ... with 2,677 more rows
-- 9 snps in 95% credible set: rs12090215, rs78278355, rs10399947, rs1134067, rs6686064, rs11204733, rs4970928, rs6660845, rs11587444...
- tables of results: `config`, `snp`, `ncausal`
- locus: 1
-- config:
-- input snps: 4148 fine-mapped + 406 missing Z/LD = 4554 in total
# A tibble: 10 x 4
rank config config_prob config_log10bf
<int> <chr> <dbl> <dbl>
1 1 rs72242061 0.203 3.92
2 2 rs72242061,rs61249550 0.191 7.51
3 3 rs60100018 0.121 3.69
# ... with 7 more rows
-- snp:
# A tibble: 4,554 x 6
snp rank_z rank_pp snp_prob snp_prob_cumsum snp_log10bf
<chr> <int> <int> <dbl> <dbl> <dbl>
1 rs61249550 3 1 0.475 0.347 3.40
2 rs72242061 1 2 0.395 0.635 3.26
3 rs60100018 2 3 0.184 0.769 2.79
# ... with 4,551 more rows
-- 4 snps in 95% credible set: rs61249550, rs72242061, rs60100018, rs144368575...
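The 95% credible sets above appear to be built by ranking SNPs by `snp_prob`, normalizing by the sum of all `snp_prob` values, and accumulating until the cumulative share reaches 0.95. This is inferred from the displayed `snp_prob_cumsum` values (for example, a `snp_prob` of 1.00 maps to a cumulative value of 0.333), so it is an assumption about how the report computed them. A minimal sketch of that procedure, with a hypothetical function name and toy probabilities:

```python
def credible_set(snp_probs, level=0.95):
    """Smallest set of top-ranked SNPs whose normalized posterior mass reaches `level`."""
    total = sum(snp_probs.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    ranked = sorted(snp_probs.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for snp, prob in ranked:
        cumulative += prob / total
        selected.append((snp, cumulative))
        if cumulative >= level:
            break
    return selected

# Toy per-SNP posterior probabilities (hypothetical, not from the report).
toy = {"snpA": 1.0, "snpB": 0.8, "snpC": 0.15, "snpD": 0.05}
for snp, cum in credible_set(toy):
    print(snp, round(cum, 3))
```

Because per-SNP probabilities can sum to more than 1 when several causal variants are allowed, the normalization step matters: without it the cumulative sum would cross 0.95 after the first SNP in the first output above.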
Barely, only numbers, tables and figures.
Benner, Christian, Aki S Havulinna, Marjo-Riitta Järvelin, Veikko Salomaa, Samuli Ripatti, and Matti Pirinen. 2017. “Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-Wide Association Studies.” The American Journal of Human Genetics 101 (4). Elsevier: 539–51.
Mahajan, Anubha, Daniel Taliun, Matthias Thurner, Neil R Robertson, Jason M Torres, N William Rayner, Valgerdur Steinthorsdottir, et al. 2018. “Fine-Mapping of an Expanded Set of Type 2 Diabetes Loci to Single-Variant Resolution Using High-Density Imputation and Islet-Specific Epigenome Maps.” bioRxiv. Cold Spring Harbor Laboratory, 245506.